Adaptive and hybrid context-aware fine-grained word sense disambiguation in topic modeling based document representation

Abstract

We propose a hybrid context based topic model with an adaptive window length for word sense disambiguation in document representation. Document representation is an essential part of various tasks, and word sense disambiguation is needed to capture the distinctions between word senses. Traditional methods mainly rely on knowledge libraries for data enrichment; however, the semantic division of a word may vary across different domain-specific datasets. We aim to discover finer-grained semantic differences, such as differences in entities or standpoints, and to handle the problem without data enrichment. There are two challenges in this task: (1) dividing the senses of each polysemous word, and (2) preserving the differences between synonyms. Most existing models either separate the senses of each word or cluster words, integrating an auxiliary module to specify the senses. They can hardly achieve both (1) and (2), since the senses are assumed to be independent and their intrinsic relationships are ignored. To solve this problem, we introduce the “Bag-of-Senses” (BoS) assumption: a document is a multiset of word senses, and the senses are generated instead of the words. The sense of a word is estimated by the context in which it occurs and the contexts of its other occurrences. Besides, the scopes of the contexts related to each word occurrence are variable, and we adjust them adaptively. Our experiments on three standard datasets show that our proposal outperforms state-of-the-art methods in terms of word sense estimation, topic modeling, and document classification.
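
The idea of estimating a sense from both the context of one occurrence and the contexts of the word's other occurrences can be illustrated with a small sketch. This is not the authors' model: the function names `local_context` and `hybrid_context`, the whitespace tokenization, and the fixed `window` argument are all assumptions made for illustration, and the abstract does not say how the window length is adapted, so the sketch simply takes it as a parameter.

```python
from collections import Counter


def local_context(tokens, i, window):
    """Return the words within `window` positions of tokens[i], excluding tokens[i]."""
    lo = max(0, i - window)
    return tokens[lo:i] + tokens[i + 1:i + 1 + window]


def hybrid_context(tokens, i, window):
    """Hypothetical helper: collect two bags of context words for occurrence i.

    local: words around this occurrence.
    other: words around every other occurrence of the same word.
    """
    target = tokens[i]
    local = Counter(local_context(tokens, i, window))
    other = Counter()
    for j, tok in enumerate(tokens):
        if tok == target and j != i:
            other.update(local_context(tokens, j, window))
    return local, other


tokens = "the bank raised rates while we sat on the river bank".split()

# Context of the first "bank" (index 1) with a window of two words on each side.
local, other = hybrid_context(tokens, 1, window=2)
print(local)
print(other)

# A narrower window changes which words count as context.
narrow_local, _ = hybrid_context(tokens, 1, window=1)
print(narrow_local)
```

In this toy sentence the two occurrences of "bank" have different neighbors ("raised", "rates" versus "river"), which is the kind of evidence a sense-level model could use. Changing `window` shows why the scope of the context matters.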

Similar resources

Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets

The paper presents a method for word sense disambiguation based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton Wordnet, according to the principles established by Euro...

A Topic Model for Word Sense Disambiguation

We develop latent Dirichlet allocation with WORDNET (LDAWN), an unsupervised probabilistic topic model that includes word sense as a hidden variable. We develop a probabilistic posterior inference algorithm for simultaneously disambiguating a corpus and learning the domains in which to consider each word. Using the WORDNET hierarchy, we embed the construction of Abney and Light (1999) in the to...

Improving Word Sense Disambiguation Using Topic Features

This paper presents a novel approach for exploiting the global context for the task of word sense disambiguation (WSD). This is done by using topic features constructed using the latent Dirichlet allocation (LDA) algorithm on unlabeled data. The features are incorporated into a modified naïve Bayes network alongside other features such as part-of-speech of neighboring words, single words in the...

Knowledge-based Word Sense Disambiguation using Topic Models

Word Sense Disambiguation is an open problem in Natural Language Processing which is particularly challenging and useful in the unsupervised setting where all the words in any given text need to be disambiguated without using any labeled data. Typically WSD systems use the sentence or a small window of words around the target word as the context for disambiguation because their computational co...

Document Clustering using Word Sense Disambiguation

In computational linguistics, word sense disambiguation (WSD) is the problem of determining in which sense a word having a number of distinct senses is used in a given sentence. This paper handles text document clustering as one of the major tasks of text processing. Document clustering is the process of finding out groups of information from the text documents and clustering these documents into...

Journal

Journal title: Information Processing and Management

Year: 2021

ISSN: 0306-4573, 1873-5371

DOI: https://doi.org/10.1016/j.ipm.2021.102592